A Non-Blocking Recovery Algorithm for Causal Message Logging

Authors

  • J. Roger Mitchell
  • Vijay K. Garg
Abstract

In the recovery of failed processes in a distributed program, causal logging schemes offer several benefits. These benefits include no rollback of unfailed processes and simple approaches to output commit. Unfortunately, previous approaches to the recovery of multiple simultaneous failures require that the distributed execution be blocked or that recovering processes coordinate. The latter requires assumptions which are not satisfactory. In this paper we present a solution that has neither of these drawbacks.

Message logging is an important technique for recovering from failures in distributed programs. This technique logs the order in which messages are received. By assuming that receive ordering is the only source of non-determinism, execution is recoverable using this ordering. Pessimistic message logging [4, 11] forces a process to wait before sending any message while the message log is written to stable storage. Optimistic logging methods [9, 12, 13, 15] (and the similar sender-based logging [8, 14]) assume failures are rare and therefore allow ordering information to be lost in a failure. (That is, a message is logged in the background while execution proceeds.) Consequently, received messages and any sends that depend on them may not be recoverable. This may then require that unfailed processes roll back their execution as well.

Causal message logging sends message receive ordering information with each message. This information includes receives and their causal history since the last send. The Manetho approach [6] uses this method. In family-based message logging (FBL) [2], causal history information for only K processes is included. This method then tolerates K simultaneous failures rather than the failure of all processes in the system (as with Manetho and the other logging methods).
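
The causal logging idea described above can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the algorithm of this paper, of Manetho, or of FBL: the names `Determinant`, `Message`, `Process`, and `recover_order` are hypothetical, and for simplicity each message piggybacks every receive-ordering record the sender knows rather than only the history since its last send.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Determinant:
    """Hypothetical record: `receiver` delivered message (sender, send_seq)
    as its recv_seq-th receive."""
    receiver: str
    recv_seq: int
    sender: str
    send_seq: int


@dataclass(frozen=True)
class Message:
    sender: str
    send_seq: int
    payload: str
    piggyback: frozenset  # receive-ordering records carried with the message


class Process:
    def __init__(self, name):
        self.name = name
        self.send_seq = 0
        self.recv_seq = 0
        self.known = set()  # determinants in volatile memory (own and learned)
        self.state = []     # application state: payloads in delivery order

    def send(self, payload):
        # Assumption: piggyback everything known, not only the history
        # since the last send.
        self.send_seq += 1
        return Message(self.name, self.send_seq, payload, frozenset(self.known))

    def receive(self, msg):
        self.recv_seq += 1
        self.known |= msg.piggyback
        self.known.add(
            Determinant(self.name, self.recv_seq, msg.sender, msg.send_seq)
        )
        self.state.append(msg.payload)


def recover_order(failed_name, survivors):
    """Rebuild the failed process's receive order from surviving processes."""
    dets = {d for p in survivors for d in p.known if d.receiver == failed_name}
    return [(d.sender, d.send_seq) for d in sorted(dets, key=lambda d: d.recv_seq)]


# Demo: p receives from r first, then from q, then sends to q.
p, q, r = Process("p"), Process("q"), Process("r")
m_q = q.send("a")
m_r = r.send("b")
p.receive(m_r)
p.receive(m_q)
q.receive(p.send("c"))  # q now holds p's receive ordering

# p fails and loses its volatile state; q still knows the order p must replay.
replay = recover_order("p", [q, r])
print(replay)
```

In this sketch, any process whose state depends on `p`'s receives also holds the ordering needed to replay them, which is why unfailed processes need not roll back. Limiting the piggybacked history to K processes, as FBL does, would bound the overhead at the cost of tolerating only K simultaneous failures.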
The causal message logging approach offers advantages over the other message logging schemes. It allows processes to execute without blocking (like optimistic logging) and never forces unfailed processes to roll back their execution (like pessimistic logging). Unfortunately, causal message logging suffers from complications associated with recovery not present in the other logging methods. One particular difficulty occurs when multiple processes fail simultaneously [7]. Solutions have been presented which require blocking unfailed processes or coordinating between recovering processes. Neither of these solutions is satisfactory. In this paper we present a solution without either of these drawbacks. We note that independently Alvisi, Rao, and Vin have also developed an algorithm for non-blocking recovery [3].

Author footnotes: supported in part by a Virginia & Ernest Cockrell fellowship; supported in part by the NSF Grants ECS-9414780, CCR-9520540, Texas Education Board Grant ARP-320, a General Motors Fellowship, and an IBM grant.

Similar resources

The Relative Overhead of Piggybacking in Causal Message Logging Protocols

Message logging protocols ensure that crashed processes make the same choices when re-executing nondeterministic events during recovery. Causal message logging protocols achieve this by piggybacking the results of these choices (called determinants) on the ambient message traffic. By doing so, these protocols do not create orphan processes nor introduce blocking in failure-free executions. To s...

New Causal Message Logging Protocol with Asynchronous Checkpointing for Distributed Systems

Causal message logging is an efficient approach for tolerating failures of processes in distributed systems because it has the advantages of both the pessimistic and optimistic message logging approaches. However, traditional causal message logging protocols prevent live processes from continuing their computation and require some synchronous logging to stable storage during recovery....

Scalable Causal Message Logging for Wide-Area Environments

Causal message logging spreads recovery information around the network in which the processes execute. This is an attractive property for wide area networks: it can be used to replicate processes that are otherwise inaccessible due to network partitions. However, current causal message logging protocols do not scale to thousands of processes. We describe the Hierarchical Causal Logging Protocol ...

The Cost of Recovery in Message Logging Protocols

Past research in message logging has focused on studying the relative overhead imposed by pessimistic, optimistic, and causal protocols during failure-free executions. In this paper, we give the first experimental evaluation of the performance of these protocols during recovery. Our results suggest that applications face a complex trade-off when choosing a message logging protocol for fault to...

Design Patterns for Log-Based Rollback Recovery

Log-based rollback recovery builds on the ideas of checkpoint-based rollback recovery and improves the characteristics of the recovery process. The basic idea captured by the log-based rollback recovery techniques is an extension of the checkpoint idea. However, instead of relying solely on checkpoints for recovering from the occurrence of an error, the system logs information about the non-determi...

Publication date: 1998